exclude fsdp from delay_optimizer_creation #34140
Conversation
Nice :) Can we add a test in tests/test_trainer.py? We can set env variables to configure Accelerate properly (ACCELERATE_MIXED_PRECISION="fp8" will auto-use TE).
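A minimal sketch of what such a test could look like, assuming the env-var route described above. The tiny checkpoint, test name, and fixtures are illustrative, not the actual test added in this PR, and running it for real needs multiple FP8-capable GPUs:

```python
import os

# Hypothetical sketch: configure Accelerate via env vars before building the Trainer.
# ACCELERATE_MIXED_PRECISION="fp8" makes Accelerate use TransformerEngine automatically;
# ACCELERATE_USE_FSDP="true" turns on FSDP.
os.environ["ACCELERATE_MIXED_PRECISION"] = "fp8"
os.environ["ACCELERATE_USE_FSDP"] = "true"

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments


def test_fsdp_fp8_training(tmp_path, train_dataset):  # fixtures assumed provided by the test suite
    model = AutoModelForSequenceClassification.from_pretrained("hf-internal-testing/tiny-random-bert")
    args = TrainingArguments(output_dir=str(tmp_path), per_device_train_batch_size=1, max_steps=2)
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()  # should not raise with FSDP + FP8 after this PR
```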
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The required tests are distributed tests. We need to verify FSDP functionality both with and without FP8 mixed precision. The appropriate test file might be
Is the CI expected to run these tests?
@eljandoubi we can't run them on the normal CI since GPU runners are not part of PRs. Instead, when ready, I'll pull the PR down and run it myself.
@muellerzr Thank you for the information. I have tested the branch in my code on a multi-node, multi-GPU setup using FSDP mode, both with and without FP8 mixed precision, and it worked as expected. Please let me know if you encounter any issues on your end.
* Fix FSDP Initialization for resume training
* Added init_fsdp function to work with dummy values
* Fix FSDP initialization for resuming training
* Added CUDA decorator for tests
* Added torch_gpu decorator to FSDP tests
* Fixup for failing code quality tests
@muellerzr Any updates regarding this PR?
Thanks! Can you do pip install -e .[quality] followed by make fixup? I'll then pull it locally to test on my 4090 system and we should be set!
@muellerzr I have done pip install -e .[quality] followed by make fixup.
Don't worry, we'll merge as is; the failing tests are unrelated!
* exclude fsdp from delay_optimizer_creation
* add test case for trainer: FSDP mode and fp8 as mixed precision
* rearrange imports
* ruff formatted
* adapt _init_fsdp to fp8
* use _init_fsdp only when resume_from_checkpoint
* In case of FSDP, self.layer will be CheckpointWrapper which has no len() method
* delete _init_fsdp
* solve conflict
* fix conflict
* make fixup
What does this PR do?
It excludes FSDP from delay_optimizer_creation and passes the model and the optimizer to accelerate.prepare together, so that FP8 mixed precision can be enabled when requested.
Fixes #34024
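Roughly, the idea can be illustrated with plain Accelerate; this is a hedged sketch under stated assumptions, not the Trainer's exact code, and the stand-in model is only for illustration:

```python
# Sketch of the underlying idea: when FP8 mixed precision is requested, the
# optimizer is created up front and passed to accelerator.prepare together with
# the model, instead of delaying optimizer creation until after the model has
# been wrapped (as is done for other FSDP paths).
import torch
from accelerate import Accelerator

model = torch.nn.Linear(8, 8)                             # stand-in model for illustration
accelerator = Accelerator(mixed_precision="fp8")          # needs TransformerEngine + FP8-capable GPUs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)  # wrapped in one call so FP8 hooks cover both
```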
Who can review?
Library: